AITopics | stochastic gradient descent algorithm

Reinforcement Learning under Model Mismatch

Aurko Roy, Huan Xu, Sebastian Pokutta

Neural Information Processing Systems | Nov-21-2025, 11:32:39 GMT

We scale up the robust algorithms to large MDPs via function approximation and prove convergence under two different settings.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A stochastic gradient descent algorithm with random search directions

Gbaguidi, Eméric

arXiv.org Machine Learning | Apr-1-2025

Firstly, we introduce some notation. Secondly, we formulate our new class of SGD algorithms with random search directions, which includes the SCGD algorithm. Finally, we spell out some regularity assumptions. The SCGD algorithm as given in (3) is the practical definition obtained by considering the vectors of the canonical basis of $\mathbb{R}^d$. However, we can extend this coordinate selection rule by using more general random vectors with a possibly adaptive sampling policy. Therefore, we introduce the Stochastic Coordinate Gradient Descent algorithm with Random Search Direction (SCORS), defined for all $n \geq 1$ by

$$X_{n+1} = X_n - \gamma_n D(V_{n+1}) \nabla f_{U_{n+1}}(X_n), \tag{SCORS}$$

where the initial state $X_1$ is a square integrable random vector of $\mathbb{R}^d$ which can be arbitrarily chosen, $D(v) = vv^T$ for any vector $v \in \mathbb{R}^d$, the sequence $(U_n)$ is independent of $(X_n)$, where $(U_n)$ is independent and identically distributed with $\mathcal{U}([\![1, N]\!])$ distribution, and $V_n$ is a random vector of $\mathbb{R}^d$ sampled from an underlying distribution $P_n$ satisfying certain conditions (see Assumption 1 below). Furthermore, we assume that $V_{n+1}$ is independent of $U_{n+1}$ conditionally on $\mathcal{F}_n$, where $\mathcal{F}_n = \sigma(X_1, \ldots$
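
The SCORS recursion above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the least-squares objective, the constant step size, and the choice of drawing the direction uniformly from the canonical basis (which reduces to the coordinate-wise SCGD special case). The function name `scors` and its parameters are hypothetical.

```python
import numpy as np

def scors(A, b, x0, n_steps, gamma, rng):
    """Illustrative SCORS-style iteration for f(x) = (1/N) * sum_i 0.5 * (a_i . x - b_i)^2.

    Assumptions (not from the source): the least-squares objective, and a
    direction V drawn uniformly from the canonical basis of R^d.
    `gamma` is a callable returning the step size gamma_n for step n.
    """
    x = np.array(x0, dtype=float)
    N, d = A.shape
    for n in range(1, n_steps + 1):
        u = rng.integers(N)               # index U_{n+1}, uniform on the N components
        grad = (A[u] @ x - b[u]) * A[u]   # gradient of f_u at the current iterate
        v = np.zeros(d)
        v[rng.integers(d)] = 1.0          # direction V_{n+1}: a random canonical basis vector
        x -= gamma(n) * v * (v @ grad)    # D(v) grad = v v^T grad
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 5))
x_star = np.array([1.0, -2.0, 0.5, 1.5, -1.0])
b = A @ x_star                            # consistent system, so x_star minimises f
x0 = np.zeros(5)
x_hat = scors(A, b, x0, n_steps=20000, gamma=lambda n: 0.05, rng=rng)
print("distance to minimiser:", np.linalg.norm(x_hat - x_star))
```

Replacing the basis-vector draw with another random vector changes only the two lines that build `v`; which distributions are admissible depends on the paper's Assumption 1, which is not reproduced in this excerpt.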

algorithm, artificial intelligence, machine learning, (13 more...)

arXiv.org Machine Learning

arXiv: 2503.19942

Country: Europe > United Kingdom > England > Bristol (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Reinforcement Learning under Model Mismatch

Aurko Roy, Huan Xu, Sebastian Pokutta

Neural Information Processing Systems | Oct-3-2024, 19:40:45 GMT

We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation to it. We address this problem by extending the framework of robust MDPs of [1, 15, 11] to the model-free Reinforcement Learning setting, where we do not have access to the model parameters, but can only sample states from it. We define robust versions of Q-learning, SARSA, and TD-learning and prove convergence to an approximately optimal robust policy and approximate value function respectively. We scale up the robust algorithms to large MDPs via function approximation and prove convergence under two different settings. We prove convergence of robust approximate policy iteration and robust approximate value iteration for linear architectures (under mild assumptions). We also define a robust loss function, the mean squared robust projected Bellman error and give stochastic gradient descent algorithms that are guaranteed to converge to a local minimum.

algorithm, convergence, function approximation, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

On the convergence of loss and uncertainty-based active learning algorithms

Haimovich, Daniel, Karamshuk, Dima, Linder, Fridolin, Tax, Niek, Vojnovic, Milan

arXiv.org Artificial Intelligence | Dec-21-2023

We study convergence rates of loss and uncertainty-based active learning algorithms under various assumptions. First, we provide a set of conditions under which a convergence rate guarantee holds, and use this for linear classifiers and linearly separable datasets to show convergence rate guarantees for loss-based sampling and different loss functions. Second, we provide a framework that allows us to derive convergence rate bounds for loss-based sampling by deploying known convergence rate bounds for stochastic gradient descent algorithms. Third, and last, we propose an active learning algorithm that combines sampling of points and stochastic Polyak's step size. We show a condition on the sampling that ensures a convergence rate guarantee for this algorithm for smooth convex loss functions. Our numerical results demonstrate efficiency of our proposed algorithm.
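
As a rough illustration of the stochastic Polyak step size mentioned above, the sketch below applies it to plain SGD. It does not reproduce the authors' active learning algorithm: points are sampled uniformly rather than by loss or uncertainty, the objective is a consistent least-squares problem so that each per-example optimum is zero, and the names `sgd_polyak`, `c`, and `gamma_max` are hypothetical.

```python
import numpy as np

def sgd_polyak(A, b, x0, n_steps, rng, c=1.0, gamma_max=10.0):
    """SGD with a stochastic Polyak step size on f_i(x) = 0.5 * (a_i . x - b_i)^2.

    Assumptions (not from the source): uniform sampling of examples and
    f_i^* = 0 for every i, which holds when the linear system is consistent.
    """
    x = np.array(x0, dtype=float)
    N = A.shape[0]
    for _ in range(n_steps):
        i = rng.integers(N)
        r = A[i] @ x - b[i]
        loss = 0.5 * r * r                       # f_i(x) - f_i^*, with f_i^* = 0
        grad = r * A[i]
        g2 = grad @ grad
        if g2 == 0.0:                            # example already fitted exactly
            continue
        step = min(loss / (c * g2), gamma_max)   # Polyak step, capped at gamma_max
        x -= step * grad
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 5))
x_star = np.array([1.0, -2.0, 0.5, 1.5, -1.0])
b = A @ x_star
x_hat = sgd_polyak(A, b, np.zeros(5), n_steps=5000, rng=rng)
print("distance to minimiser:", np.linalg.norm(x_hat - x_star))
```

The step size needs no tuned learning rate because it is computed from the current loss and gradient norm; the cap `gamma_max` only guards against very small gradients.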

algorithm, loss function, step size, (13 more...)

arXiv.org Artificial Intelligence

arXiv: 2312.13927

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics

Vu-Quoc, Loc, Humer, Alexander

arXiv.org Artificial Intelligence | Jun-19-2023

Three recent breakthroughs due to AI in arts and science serve as motivation: An award winning digital image, protein folding, fast matrix multiplication. Many recent developments in artificial neural networks, particularly deep learning (DL), applied and relevant to computational mechanics (solid, fluids, finite-element technology) are reviewed in detail. Both hybrid and pure machine learning (ML) methods are discussed. Hybrid methods combine traditional PDE discretizations with ML methods either (1) to help model complex nonlinear constitutive relations, (2) to nonlinearly reduce the model order for efficient simulation (turbulence), or (3) to accelerate the simulation by predicting certain components in the traditional integration methods. Here, methods (1) and (2) relied on Long-Short-Term Memory (LSTM) architecture, with method (3) relying on convolutional neural networks. Pure ML methods to solve (nonlinear) PDEs are represented by Physics-Informed Neural network (PINN) methods, which could be combined with attention mechanism to address discontinuous solutions. Both LSTM and attention architectures, together with modern and generalized classic optimizers to include stochasticity for DL networks, are extensively reviewed. Kernel machines, including Gaussian processes, are provided to sufficient depth for more advanced works such as shallow networks with infinite width. Not only addressing experts, readers are assumed familiar with computational mechanics, but not with DL, whose concepts and applications are built up from the basics, aiming at bringing first-time learners quickly to the forefront of research. History and limitations of AI are recounted and discussed, with particular attention at pointing out misstatements or misconceptions of the classics, even in well-known references. Positioning and pointing control of a large-deformable beam is given as an example.

artificial intelligence, dual-porosity dual-permeability governing equation, survey article, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.32604/cmes.2023.028130

arXiv: 2212.08989

Country:

North America > United States > California (0.67)
North America > United States > Illinois (0.45)
Asia > South Korea (0.45)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Parallelized Stochastic Gradient Descent

Neural Information Processing Systems | Apr-6-2023, 13:34:22 GMT

With the increase in available data, parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm, including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization algorithms, our variant comes with parallel acceleration guarantees and it poses no overly tight latency constraints, which might only be available in the multicore setting. Our analysis introduces a novel proof technique, contractive mappings, to quantify the speed of convergence of parameter distributions to their asymptotic limits. As a side effect, this answers the question of how quickly stochastic gradient descent algorithms reach the asymptotically normal regime.
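
The abstract does not spell out the algorithm, so the following is only a sketch of one simple parallel scheme, parameter averaging, which avoids communication during training: each worker runs SGD on its own shard and the results are averaged once at the end. The workers are simulated sequentially here, and the least-squares objective and the names `local_sgd` and `averaged_parallel_sgd` are assumptions for illustration.

```python
import numpy as np

def local_sgd(A, b, x0, n_steps, lr, rng):
    """Plain SGD on one worker's shard of a least-squares problem."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        i = rng.integers(A.shape[0])
        x -= lr * (A[i] @ x - b[i]) * A[i]
    return x

def averaged_parallel_sgd(A, b, x0, n_workers, n_steps, lr, seed=0):
    """Split the data into shards, run SGD independently on each, average the results.

    Workers are simulated one after another; in a real deployment each call to
    local_sgd would run on its own machine with no communication until the end.
    """
    rng = np.random.default_rng(seed)
    shards = np.array_split(rng.permutation(A.shape[0]), n_workers)
    results = [
        local_sgd(A[idx], b[idx], x0, n_steps, lr, np.random.default_rng(seed + 1 + k))
        for k, idx in enumerate(shards)
    ]
    return np.mean(results, axis=0)

rng = np.random.default_rng(0)
A = rng.normal(size=(400, 5))
x_true = np.array([1.0, -2.0, 0.5, 1.5, -1.0])
b = A @ x_true + 0.1 * rng.normal(size=400)
x_avg = averaged_parallel_sgd(A, b, np.zeros(5), n_workers=4, n_steps=5000, lr=0.01)
print("distance to x_true:", np.linalg.norm(x_avg - x_true))
```

Because the only synchronisation point is the final average, this kind of scheme places no tight latency requirement on the link between workers.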

parallelized stochastic gradient descent, stochastic gradient descent algorithm

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Dive Into Deep Learning -- Part 2. This is part 2 of my summary of the…

#artificialintelligence | Feb-26-2023, 08:15:08 GMT

The naive approach: take the derivative of the loss function, which is an average of the losses calculated on every example in the dataset. A full update is powerful, but it has some drawbacks:
- It can be extremely slow, as we need to pass over the entire dataset to make a single update.
- If there is a lot of redundancy in the training data, the benefit of a full update is very low.

The extreme approach: consider only a single example at a time and update based on one observation at a time. Does that remind you of something? Yes, it's the stochastic gradient descent algorithm, or SGD. It can be effective even on large datasets, but it also has some drawbacks:
- It can take longer to process one sample at a time compared to a full batch.
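
The two approaches described above differ only in how many examples feed each update, which a short sketch can make concrete. The linear regression problem, the learning rate, and the names `gradient`, `train`, and `mse` are assumptions chosen for illustration, not taken from the summarised material.

```python
import numpy as np

def gradient(A, b, x, idx):
    """Average gradient of 0.5 * (a_i . x - b_i)^2 over the examples in idx."""
    r = A[idx] @ x - b[idx]
    return A[idx].T @ r / len(idx)

def train(A, b, batch_size, n_updates, lr, seed=0):
    """batch_size = N gives the full update; batch_size = 1 gives SGD."""
    rng = np.random.default_rng(seed)
    N, d = A.shape
    x = np.zeros(d)
    for _ in range(n_updates):
        idx = rng.choice(N, size=batch_size, replace=False)
        x -= lr * gradient(A, b, x, idx)
    return x

def mse(A, b, x):
    return float(np.mean((A @ x - b) ** 2))

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
x_true = np.array([1.0, -2.0, 0.5, 1.5, -1.0])
b = A @ x_true + 0.1 * rng.normal(size=200)

x_full = train(A, b, batch_size=200, n_updates=200, lr=0.05)   # 200 passes over the data
x_sgd = train(A, b, batch_size=1, n_updates=2000, lr=0.05)     # 10 passes' worth of examples
print("full-batch MSE:", mse(A, b, x_full))
print("SGD MSE:       ", mse(A, b, x_sgd))
```

In this toy setup the full-batch run touches 200 examples per update while the SGD run touches one, so the second run processes far fewer examples in total even though it makes more updates.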

dataset, drawback, full update, (10 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Better Methods and Theory for Federated Learning: Compression, Client Selection and Heterogeneity

Horváth, Samuel

arXiv.org Machine Learning | Jul-1-2022

Federated learning (FL) is an emerging machine learning paradigm involving multiple clients, e.g., mobile phone devices, with an incentive to collaborate in solving a machine learning problem coordinated by a central server. FL was proposed in 2016 by Konečný et al. and McMahan et al. as a viable privacy-preserving alternative to traditional centralized machine learning since, by construction, the training data points are decentralized and never transferred by the clients to a central server. Therefore, to a certain degree, FL mitigates the privacy risks associated with centralized data collection. Unfortunately, optimization for FL faces several specific issues that centralized optimization usually does not need to handle. In this thesis, we identify several of these challenges and propose new methods and algorithms to address them, with the ultimate goal of enabling practical FL solutions supported with mathematically rigorous guarantees.

artificial intelligence, machine learning, top-5 and rand-5 compression operator, (19 more...)

arXiv.org Machine Learning

arXiv: 2207.00392

Country:

North America > United States > California (0.13)
Asia > Middle East > Jordan (0.04)
Europe > Russia (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SignSGD: Fault-Tolerance to Blind and Byzantine Adversaries

Akoun, Jason, Meyer, Sebastien

arXiv.org Machine Learning | Feb-7-2022

Distributed learning has become a necessity for training ever-growing models by sharing calculation among several devices. However, some of the devices can be faulty, deliberately or not, preventing the proper convergence. As a matter of fact, the baseline distributed SGD algorithm does not converge in the presence of one Byzantine adversary. In this article we focus on the more robust SignSGD algorithm derived from SGD. We provide an upper bound for the convergence rate of SignSGD proving that this new version is robust to Byzantine adversaries. We implemented SignSGD along with Byzantine strategies attempting to crush the learning process. Therefore, we provide empirical observations from our experiments to support our theory. Our code is available on GitHub https://github.com/jasonakoun/signsgd-fault-tolerance and our experiments are reproducible by using the provided parameters.
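
To give a feel for why sign-based aggregation can tolerate faulty workers, here is a minimal simulation. It is not the authors' implementation (that lives in the repository linked above). The majority-vote aggregation, the sign-flipping adversary, the least-squares objective, and the name `signsgd_majority` are all assumptions made for illustration.

```python
import numpy as np

def signsgd_majority(A, b, x0, n_workers, n_byzantine, n_steps, lr, batch, seed=0):
    """SignSGD with majority-vote aggregation on a least-squares problem.

    Assumptions (not from the source): each worker sends only the sign of its
    minibatch gradient, the server steps along the sign of the summed votes,
    and the first n_byzantine workers send inverted signs.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    N = A.shape[0]
    for _ in range(n_steps):
        votes = np.zeros_like(x)
        for w in range(n_workers):
            idx = rng.integers(N, size=batch)
            g = A[idx].T @ (A[idx] @ x - b[idx]) / batch
            s = np.sign(g)
            if w < n_byzantine:
                s = -s                    # hypothetical adversary: flip every sign
            votes += s
        x -= lr * np.sign(votes)          # each coordinate moves by exactly lr (or 0 on a tie)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
x_star = np.array([1.0, -2.0, 0.5, 1.5, -1.0])
b = A @ x_star
x_hat = signsgd_majority(A, b, np.zeros(5), n_workers=9, n_byzantine=2,
                         n_steps=1500, lr=0.01, batch=32)
print("distance to minimiser:", np.linalg.norm(x_hat - x_star))
```

Each worker contributes at most one vote per coordinate, so a minority of adversaries cannot outweigh honest workers whose gradient signs agree; an averaging scheme, by contrast, can be dominated by a single worker sending very large values.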

adversary, byzantine adversary, signsgd, (16 more...)

arXiv.org Machine Learning

arXiv: 2202.02085

Country: Europe > France (0.04)

Genre: Research Report (0.66)

Technology: